STOCHASTIC MODELS IN A FREE-RECALL EXPERIMENT

1.  Introduction 

The purpose of this paper is to show some of the stochastic
models used to represent the data of a free-recall experiment
often done in psychology. The use of stochastic models is
relatively new. Most of the work has been done since 1950.

This paper will consider four stochastic models. The choice of
a model depends on the assumptions made and on the experimental
procedure. It is well known in psychology that two experiments on
the same variable may not yield the same results and conclusions,
because two experimenters studying the same variable may vary the
other variables in their experiments differently. So before a
stochastic model can be used the experimental procedure must be
specified. In the free-recall procedure for the first three models
considered in this paper, the subject is allowed as many trials as
he needs to learn the list of words completely. In the last model
considered, the subject may only partially learn the list of words.

A  stochastic  model  provides  a  framework  for  analyzing  data 
at  the  level  of  single  subjects  and  single  trials.  Models  also 
provide  a  way  to  summarize  the  data  once  various  parameters  are 
estimated.  For  a  general  discussion  of  the  value  of  stochastic 
models  see  Miller  (1964) . 


The  first  model  considered  is  a  model  by  Bush  and  Mosteller 
(1955) .   They  assumed  that  if  a  word  is  recalled  on  a  trial  the 
probability  of  recalling  the  word  on  the  next  trial  is  increased, 
and  this  change  could  be  represented  by  linear  operators. 

The second model is the Miller and McGill (1952) model.
Their model is closely related to Bush and Mosteller's model.
Use of some results from Bush and Mosteller's model in Miller and
McGill's model yields several interesting results.

The third model considered is a model by Waugh and Smith
(1962). Their model is a Markov chain with an absorbing state.
They look at the words in terms of what state they are in.

The last model considered is the model by Cowan (1966).
This model is not like the previous models. Cowan's model
considers the fact that certain words tend to cluster together in
recall.

2.  Description  of  a  Free-Recall  Experiment 

A list of nonsense syllables or monosyllabic words is
presented to a subject. At the end of the presentation the subject
writes down all the words that he can recall. The order of the
words is then randomized, and the procedure is repeated until the
list is completely learned.

3. The Bush-Mosteller Linear Model

In developing their model Bush and Mosteller were influenced
by a paper written by Estes (1950). The authors first
described in two papers, (1951) and (1953), the basic structure
of the linear model. Since then they have published many other
articles and books on their model, most of which are listed in
Atkinson, Bower, and Crothers (1965).

3.1  Definitions  and  Terms. 

Let $p$ be the probability of a word being recalled, and $q$ be
the probability of that word not being recalled. These are the
probabilities of two mutually exclusive events. Either the word is
recalled, denoted by $E_1$, or the word is not recalled, denoted by
$E_2$. It is assumed that whenever either of the two events occurs
the probabilities of recall and non-recall are altered. So,
corresponding to each event there is a mathematical operator $T_i$
($i = 1, 2$) which, when applied to the probabilities, transforms
them into the probabilities of recall and non-recall on the
succeeding trial. Bush and Mosteller (1951) considered the case
where the operation $Op$ was expressible as a power series in $p$.
They considered the function $Tp = a_0 + a_1 p$ as an approximation
to the function $Op$. Since $Tp$ was a linear function of $p$,
matrix operators could be used.

3.2  The  General  Model. 

Bush and Mosteller (1955) considered that the event $E_i$ had
a matrix operator $T_i$ of the general form

$$T_i = \begin{pmatrix} u_{11,i} & u_{12,i} \\ u_{21,i} & u_{22,i} \end{pmatrix}, \qquad i = 1, 2 .$$

Applying the operator $T_i$ to the probability vector $\mathbf{p} = (p, q)'$,
the vector $T_i\mathbf{p}$ is obtained:

$$T_i\mathbf{p} = \begin{pmatrix} u_{11,i}\,p + u_{12,i}\,q \\ u_{21,i}\,p + u_{22,i}\,q \end{pmatrix} .$$

The probability of recalling a word on the next trial after event
$E_i$ occurs is $u_{11,i}\,p + u_{12,i}\,q$, whereas the probability of
non-recall is $u_{21,i}\,p + u_{22,i}\,q$. These new probabilities must sum to
one. So

$$(u_{11,i}\,p + u_{12,i}\,q) + (u_{21,i}\,p + u_{22,i}\,q) = 1$$

or

$$(u_{11,i} + u_{21,i})\,p + (u_{12,i} + u_{22,i})\,q = 1 .$$

The above equation must hold for all values of $p$ and $q$ consistent
with the condition that $p$ and $q$ sum to unity, and so in
particular for $p = 1$ and $q = 0$,

$$u_{11,i} + u_{21,i} = 1 ,$$

whereas for $q = 1$ and $p = 0$

$$u_{12,i} + u_{22,i} = 1 .$$

These equations mean the columns of the matrix $T_i$ must sum to
unity. Letting $a_i = u_{12,i}$ and $b_i = u_{21,i}$, the matrix operator $T_i$
may be written as

$$T_i = \begin{pmatrix} 1-b_i & a_i \\ b_i & 1-a_i \end{pmatrix} .$$

Applying the operator $T_i$ to the probability vector $\mathbf{p}$ gives

$$T_i\mathbf{p} = \begin{pmatrix} (1-b_i)\,p + a_i\,q \\ b_i\,p + (1-a_i)\,q \end{pmatrix}, \qquad i = 1, 2 .$$


Let $Q_i p$ and $Q_i q$ denote the first and second elements of the vector
$T_i\mathbf{p}$ respectively. Letting $\alpha_i = 1 - a_i - b_i$, $a_i = (1-\alpha_i)\lambda_i$, and using
the fact that $p = 1 - q$, the element $Q_i p$ may be written as

$$Q_i p = \alpha_i p + (1-\alpha_i)\lambda_i . \tag{3.2.1}$$

Bush and Mosteller (1955) have shown that for $Q_i p$ to be between
the limits of zero and one and to represent learning probabilities,
$0 \le \alpha_i \le 1$ and $0 \le \lambda_i \le 1$ must hold. Note that $Q_1 p$
is the probability of recalling a word on the next trial after
event $E_1$ has occurred.

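On succeeding trials either $E_1$ or $E_2$ occurs. The occurrence
of $E_1$ means that the operator $Q_1$ must be applied to the
probability $Q_1 p$:

$$\begin{aligned}
Q_1(Q_1 p) &= \alpha_1 (Q_1 p) + (1-\alpha_1)\lambda_1 \\
 &= \alpha_1\bigl(\alpha_1 p + (1-\alpha_1)\lambda_1\bigr) + (1-\alpha_1)\lambda_1 \\
 &= \alpha_1^2\, p + (1-\alpha_1^2)\lambda_1 .
\end{aligned}$$

The forms of $Q_1 p$ and $Q_1^2 p$ suggest that the general form for any number
$n$ of applications is

$$Q_1^n p = \alpha_1^n\, p + (1-\alpha_1^n)\lambda_1 .$$

Using mathematical induction, the general form can be proven to
be true. Now, when $\alpha_i$ is less than unity, $\alpha_i^n$ tends to zero as $n$
gets large, so

$$\lim_{n \to \infty} Q_i^n p = \lambda_i . \tag{3.2.2}$$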
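The convergence in (3.2.2) can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the parameter values are illustrative assumptions. It applies the linear operator of (3.2.1) repeatedly and compares the result with the closed form $\alpha^n p + (1-\alpha^n)\lambda$.

```python
def apply_operator(p, alpha, lam):
    """One application of Q p = alpha*p + (1 - alpha)*lam, as in (3.2.1)."""
    return alpha * p + (1.0 - alpha) * lam


def iterate_operator(p, alpha, lam, n):
    """Apply the same operator n times in succession."""
    for _ in range(n):
        p = apply_operator(p, alpha, lam)
    return p


def closed_form(p, alpha, lam, n):
    """General form alpha^n * p + (1 - alpha^n) * lam."""
    return alpha**n * p + (1.0 - alpha**n) * lam


# Illustrative (hypothetical) parameter values.
p_start, alpha, lam = 0.2, 0.7, 0.9
for n in (1, 2, 5, 50):
    print(n, iterate_operator(p_start, alpha, lam, n), closed_form(p_start, alpha, lam, n))
```

For $\alpha < 1$ both columns of the printout approach the limit point $\lambda$ as $n$ grows.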

3.3  Assumptions  Made  for  Free-Recall  Experiments. 

To simplify the estimation problem of the parameters, Bush
and Mosteller (1955) made certain assumptions. The first assumption
made was that the probability of recalling one word is independent
of the other words. The second assumption made was that
all words have the same initial probability of recall, $p_0$. The
third assumption was that all words were equally difficult to
learn and the position on the list was irrelevant. The fourth
assumption made was that the non-recall of a word doesn't change
its probability of being recalled on the next trial. The fifth
assumption made was that a subject could learn a list of words
perfectly.

3.4  What  the  Assumptions  Mean  to  the  Model. 

Let the probability that the $i$th word is recalled on trial
$n$ be $p_{i,n}$. Now given that the $i$th word is not recalled on the
$n$th trial, the probability of recall on the $(n+1)$th trial is not
changed, by the fourth assumption. So $Q_2$, which is applied when
$E_2$ occurs, must be the identity operator. This means that (3.2.1)
becomes

$$p_{i,n+1} = Q_2\, p_{i,n} = p_{i,n} . \tag{3.4.1}$$

For the equation to be of this form, $\alpha_2$ must be equal to
one. Using the last assumption and (3.2.2), $\lambda_1 = 1$. This
means that

$$p_{i,n+1} = Q_1\, p_{i,n} = \alpha_1 p_{i,n} + (1-\alpha_1) . \tag{3.4.2}$$

Since all the words start with the same initial probability
of recall, any words that have been recalled exactly $k$ times
will have the same probability $p_k$ of recall on the next trial.
To find the probability of recalling a word after $k$ recalls, the
operator $Q_1$ would be applied $k$ times. The first application
yields

$$p_1 = Q_1 p_0 = \alpha_1 p_0 + (1-\alpha_1) .$$

The probability after two recalls is

$$\begin{aligned}
p_2 = Q_1 p_1 &= \alpha_1\bigl[\alpha_1 p_0 + (1-\alpha_1)\bigr] + (1-\alpha_1) \\
 &= \alpha_1^2\, p_0 + (1-\alpha_1^2) .
\end{aligned}$$

If this procedure is continued $k$ times the result obtained would
be

$$p_k = Q_1^k p_0 = \alpha_1^k\, p_0 + (1-\alpha_1^k) . \tag{3.4.3}$$

This general form may be proven to be correct by using
mathematical induction.
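As a small worked step that is not in the original text, (3.4.3) can be restated in terms of the non-recall probability, which makes the geometric decay explicit. The symbol $q_k = 1 - p_k$ is introduced here only for illustration.

```latex
\begin{aligned}
q_k = 1 - p_k &= 1 - \alpha_1^k\, p_0 - (1 - \alpha_1^k) \\
              &= \alpha_1^k\,(1 - p_0) = \alpha_1^k\, q_0 .
\end{aligned}
```

For instance, with the hypothetical values $p_0 = 0.2$ and $\alpha_1 = 0.5$, three recalls give $q_3 = 0.125 \times 0.8 = 0.1$, so $p_3 = 0.9$.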

The third assumption of all the words being equally difficult
can be satisfied very easily when the words are nonsense syllables.
Both Glaze (1928) and Kruger (1934) have computed the meaningfulness
of nonsense syllables. By picking out syllables that are equally
meaningful the syllables would be approximately equal in difficulty.
When using monosyllables the difficulty of a word would depend
on each subject's background. There is no criterion that can be
used to rate monosyllables on difficulty. This does not mean the
model cannot be used, but if the model does not fit the data very
well the experimenter should be aware that this assumption may have
been incorrect. Also it should be noted that this assumption
implies that primacy and recency have no effect.

The  fifth  assumption  means  that  the  subjects  are  given  as 
many  trials  as  they  need  in  order  to  learn  the  list. 

3.5 Estimation of the Parameter $p_0$.

The initial trial is equivalent to $N$ binomial trials with
probability $p_0$ of a success, where $N$ is the total number of words
to be recalled. Let $x_{i,0} = 1$ if the $i$th word is recalled on the
initial trial, and $x_{i,0} = 0$ if it is not. Using Fryer (1966),

$$\hat{p}_0 = \frac{1}{N} \sum_{i=1}^{N} x_{i,0} \tag{3.5.1}$$

would be an unbiased maximum likelihood estimator of $p_0$ with
variance

$$\sigma^2(\hat{p}_0) = \frac{p_0 (1-p_0)}{N} . \tag{3.5.2}$$

For a quick and easy way to obtain an estimate of $p_0$ the above
estimator can be used.

A better estimate of $p_0$ can be made by using more information.
Since the non-recall of a word doesn't change its probability
of being recalled on the next trial, the data for each word can be
used to estimate $p_0$. For each word the number of trials preceded
entirely by zero recalls can be obtained from the data. Let $N_0$
be the total number of word trials which are preceded entirely by
zero recalls. Using Mood and Graybill (1963), the probability of
obtaining a value $x$ of $N_0$ can be found from the negative binomial
distribution, and is given by

$$f(x) = \binom{x-1}{N-1}\, p_0^{N} (1-p_0)^{x-N} .$$

To maximize the likelihood function $f(N_0)$ the logarithm can be
differentiated with respect to $p_0$, and set equal to zero:

$$L^* = \log L = \log \binom{N_0-1}{N-1} + (N_0 - N) \log (1-p_0) + N \log p_0 ,$$

$$\frac{\partial L^*}{\partial p_0} = -\frac{N_0 - N}{1-p_0} + \frac{N}{p_0} = 0 .$$

From this the maximum likelihood estimate of $p_0$ is obtained as

$$\hat{p}_0 = N / N_0 . \tag{3.5.3}$$

This estimate is not unbiased, but Girshick, Mosteller, and
Savage (1946) have shown that when $N$ is fixed and $N_0$ is varied
the estimator

$$\hat{p}_0 = \frac{N-1}{N_0-1}$$

is unbiased. For large $N$, however, (3.5.3) can be used. Bush and
Mosteller (1955) showed the asymptotic variance of (3.5.3) to be

$$\sigma^2(\hat{p}_0) = \frac{p_0^2 (1-p_0)}{N} . \tag{3.5.4}$$

This variance is smaller than the variance of (3.5.2) when $p_0$ is
less than one, because of the extra $p_0$ factor in (3.5.4).
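The two estimators of $p_0$ described in Section 3.5 can be sketched as follows. This is an illustration only: the function names and the small data set are hypothetical, and $N_0$ is read here as the total, over words, of the number of trials up to and including each word's first recall.

```python
def estimate_p0_first_trial(first_trial_recalls):
    """(3.5.1): proportion of the N words recalled on the initial trial.

    first_trial_recalls holds one 0/1 indicator x_{i,0} per word.
    """
    return sum(first_trial_recalls) / len(first_trial_recalls)


def estimate_p0_all_trials(trials_to_first_recall):
    """(3.5.3) and its unbiased variant.

    trials_to_first_recall holds, for each word, the number of the trial
    on which the word was first recalled. Their sum plays the role of N_0.
    Returns (N / N_0, (N - 1) / (N_0 - 1)).
    """
    n_words = len(trials_to_first_recall)
    n_zero = sum(trials_to_first_recall)
    return n_words / n_zero, (n_words - 1) / (n_zero - 1)


# Hypothetical data for N = 8 words.
first_recall_trial = [1, 3, 2, 1, 5, 2, 1, 4]
x0 = [1 if t == 1 else 0 for t in first_recall_trial]

print(estimate_p0_first_trial(x0))
print(estimate_p0_all_trials(first_recall_trial))
```

The second estimator uses every trial before a word's first recall, which is why its asymptotic variance (3.5.4) is smaller than (3.5.2).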


3.6 Estimate of the Parameter $\alpha_1$.

After $n$ trials there will be $2^n$ possible and different
sequences of recalls and non-recalls of a word. Let $q_{k,n}$ be the
probability of non-recall of a word in the $k$th sequence on the
$n$th trial. In the same way that (3.4.1) and (3.4.2) were derived, the
probabilities of recall on trial $n+1$ for a word of the $k$th
sequence are given by

$$Q_1 p_{k,n} = \alpha_1 p_{k,n} + (1-\alpha_1)$$

and

$$Q_2 p_{k,n} = p_{k,n} .$$

Using the fact that $Q_i p_{k,n} = 1 - Q_i q_{k,n}$, the above equations can be
rewritten as

$$Q_1 q_{k,n} = \alpha_1 q_{k,n}$$

and

$$Q_2 q_{k,n} = q_{k,n} .$$

A word in the $k$th sequence is recalled with probability $1 - q_{k,n}$,
and if the word is recalled on trial $n$ it has probability $\alpha_1 q_{k,n}$
of not being recalled on trial $n+1$. A word in the $k$th sequence
is not recalled with probability $q_{k,n}$, and if the word is not
recalled on trial $n$ it has probability $q_{k,n}$ of not being recalled
on trial $n+1$. The mean value of $q_{k,n+1}$ by definition is

$$E(q_{k,n+1}) = \alpha_1 q_{k,n} (1 - q_{k,n}) + q_{k,n}\, q_{k,n} .$$

To find the mean value over the entire population of words,
denoted by $V_{1,n+1}$, these values are summed, each weighted by
the probability of occurrence $Q_{k,n}$ of its sequence. So

$$\begin{aligned}
V_{1,n+1} &= \sum_{k=1}^{2^n} \bigl[\alpha_1 q_{k,n}(1-q_{k,n}) + q_{k,n}^2\bigr] Q_{k,n} \\
 &= \alpha_1 \sum_{k=1}^{2^n} q_{k,n} Q_{k,n} + (1-\alpha_1) \sum_{k=1}^{2^n} q_{k,n}^2 Q_{k,n} \\
 &= \alpha_1 V_{1,n} + (1-\alpha_1) V_{2,n}
\end{aligned} \tag{3.6.1}$$

where $V_{2,n}$ is the second moment of the $q_{k,n}$ values about the
origin. Equation (3.6.1) does not give an exact solution, because
of the $V_{2,n}$ term. Bush and Mosteller (1955), using several
approximations, found that

$$\bar{T}_1 = E(T_1) = -\frac{1}{1-\alpha_1} \ln (1-q_0) \tag{3.6.2}$$

where $\bar{T}_1$ is the mean total number of non-recalls. Using (3.6.2)
gives

$$\alpha_1 = 1 + \frac{\ln (1-q_0)}{\bar{T}_1} . \tag{3.6.3}$$

By counting the number of non-recalls actually made in the
experiment the quantity $\bar{T}_1$ can be estimated. The value of $p_0 = 1 - q_0$ can
be estimated by using either method described in Section 3.5.
Knowing these estimates the value of $\alpha_1$ can be estimated quite
easily by (3.6.3).

By knowing only the estimated values of $p_0$ and $\alpha_1$ the data
of a free-recall experiment can easily be summarized by the Bush
and Mosteller model.
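The estimate of $\alpha_1$ follows from solving (3.6.2) for $\alpha_1$, which gives $\alpha_1 = 1 + \ln(p_0)/\bar{T}_1$ with $p_0 = 1 - q_0$. A minimal sketch is given below; the function names and the numerical inputs are hypothetical and serve only to show the arithmetic.

```python
import math


def expected_nonrecalls(p0, alpha1):
    """Approximation (3.6.2): mean total non-recalls per word."""
    return -math.log(p0) / (1.0 - alpha1)


def estimate_alpha1(p0_hat, mean_nonrecalls):
    """Solve (3.6.2) for alpha_1, given estimates of p_0 and of T-bar."""
    return 1.0 + math.log(p0_hat) / mean_nonrecalls


# Hypothetical estimates: p_0 = 0.4 and an observed mean of 3 non-recalls per word.
alpha1_hat = estimate_alpha1(0.4, 3.0)
print(alpha1_hat)
print(expected_nonrecalls(0.4, alpha1_hat))  # recovers the mean used above
```

Because $\ln p_0$ is negative for $0 < p_0 < 1$, the estimate falls below one, and it is nonnegative only when $\bar{T}_1 \ge -\ln p_0$.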

Bush and Mosteller (1955) have given other ways to estimate
$\alpha_1$ for special cases, i.e. when $q_0$ equals one. It is not
worthwhile in this paper to consider these methods, because (3.6.3)
can be used for the special cases, and the amount of calculation
needed to apply the methods is greater than when (3.6.3) is used.

4.  The  Miller  and  McGill  Stochastic  Model. 


Miller and McGill (1952) developed a stochastic model that
is closely related to the linear model. Using the same assumptions
that were used in the linear model, other quantities of
interest can be studied by using the Miller and McGill model.

4.1  Definitions  and  Terms  of  the  Model. 

Miller and McGill (1952) classified words according to which
state they were in, where a word which had been recalled exactly
$k$ times on the preceding trials was said to be in state $A_k$. The
probability that a word was in state $A_k$ on trial $n$ was denoted by
$p(A_k, n)$.

4.2  The  Difference  Equation  and  Its  Solution. 

A word can get into state $A_k$ on trial $n+1$ in only two ways.
Either the word is in state $A_k$ on trial $n$ and it is not recalled on
trial $n+1$, or the word is in state $A_{k-1}$ on trial $n$ and it is
recalled. So the difference equation to represent this is

$$p(A_k, n+1) = p(A_k, n)(1-p_k) + p(A_{k-1}, n)\, p_{k-1} \tag{4.2.1}$$

where $p_k$ is the probability that a word will be recalled after $k$
recalls, and is given by (3.4.3).

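To obtain the general solution of (4.2.1), the system of
equations of (4.2.1) is written in matrix form

$$\begin{pmatrix}
1-p_0 & 0 & 0 & 0 & \cdots \\
p_0 & 1-p_1 & 0 & 0 & \cdots \\
0 & p_1 & 1-p_2 & 0 & \cdots \\
0 & 0 & p_2 & 1-p_3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix} p(A_0,n) \\ p(A_1,n) \\ p(A_2,n) \\ p(A_3,n) \\ \vdots \end{pmatrix}
=
\begin{pmatrix} p(A_0,n+1) \\ p(A_1,n+1) \\ p(A_2,n+1) \\ p(A_3,n+1) \\ \vdots \end{pmatrix} . \tag{4.2.2}$$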
Let $T$ be the first matrix in (4.2.2), the matrix of transition
probabilities. Let $d_n$ and $d_{n+1}$ be the column vectors of the state
probabilities on trial $n$ and trial $n+1$ respectively, also in
(4.2.2). So (4.2.2) may be written as

$$T d_n = d_{n+1} .$$

The state probabilities on trial one are given by $T d_0 = d_1$. The
state probabilities on trial two are given by $T d_1 = d_2$, or

$$T d_1 = T(T d_0) = T^2 d_0 = d_2 .$$

Continuing this procedure, it is apparent that the state
probabilities on trial $n$ are given by

$$T^n d_0 = d_n .$$


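By Rao (1965), the semi-matrix $T$ can be written as

$$T = S D S^{-1} \tag{4.2.3}$$

where $D$ is an infinite diagonal matrix with the same elements on
its diagonal as are on the main diagonal of $T$, and with the
remaining elements being zero. So $T^2$ may be written as

$$T^2 = (S D S^{-1})(S D S^{-1}) = S D^2 S^{-1} .$$

In general, then,

$$T^n = S D^n S^{-1} .$$

Since $D$ is a diagonal matrix, $D^n$ is obtained by taking the $n$th
power of every diagonal element of $D$. Rewrite (4.2.3) as

$$T S = S D .$$

The diagonal elements of $S$ are arbitrary, so let $S_{ii} = 1$:

$$\begin{pmatrix}
1-p_0 & 0 & 0 & \cdots \\
p_0 & 1-p_1 & 0 & \cdots \\
0 & p_1 & 1-p_2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & \cdots \\
S_{21} & 1 & 0 & \cdots \\
S_{31} & S_{32} & 1 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & \cdots \\
S_{21} & 1 & 0 & \cdots \\
S_{31} & S_{32} & 1 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}
\begin{pmatrix}
1-p_0 & 0 & 0 & \cdots \\
0 & 1-p_1 & 0 & \cdots \\
0 & 0 & 1-p_2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix} .$$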

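The $S_{ij}$ terms can be solved for term by term. The matrix $S$ turns
out to be equal to

$$S = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots \\
\dfrac{p_0}{p_1-p_0} & 1 & 0 & 0 & \cdots \\
\dfrac{p_0 p_1}{(p_1-p_0)(p_2-p_0)} & \dfrac{p_1}{p_2-p_1} & 1 & 0 & \cdots \\
\dfrac{p_0 p_1 p_2}{(p_1-p_0)(p_2-p_0)(p_3-p_0)} & \dfrac{p_1 p_2}{(p_2-p_1)(p_3-p_1)} & \dfrac{p_2}{p_3-p_2} & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} .$$

Taking the inverse of $S$ gives $S^{-1}$, which turns out to be

$$S^{-1} = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots \\
\dfrac{p_0}{p_0-p_1} & 1 & 0 & 0 & \cdots \\
\dfrac{p_0 p_1}{(p_0-p_2)(p_1-p_2)} & \dfrac{p_1}{p_1-p_2} & 1 & 0 & \cdots \\
\dfrac{p_0 p_1 p_2}{(p_0-p_3)(p_1-p_3)(p_2-p_3)} & \dfrac{p_1 p_2}{(p_1-p_3)(p_2-p_3)} & \dfrac{p_2}{p_2-p_3} & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} .$$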


So the first column of $T^n$ turns out to be

$$\begin{pmatrix}
(1-p_0)^n \\[4pt]
p_0 \left[ \dfrac{(1-p_0)^n}{p_1-p_0} + \dfrac{(1-p_1)^n}{p_0-p_1} \right] \\[8pt]
p_0 p_1 \left[ \dfrac{(1-p_0)^n}{(p_1-p_0)(p_2-p_0)} + \dfrac{(1-p_1)^n}{(p_0-p_1)(p_2-p_1)} + \dfrac{(1-p_2)^n}{(p_0-p_2)(p_1-p_2)} \right] \\
\vdots
\end{pmatrix} .$$

Only the first column of $T^n$ was found because
$d_0$ is just the column vector $(1, 0, 0, \ldots)'$. So $T^n d_0$ involves only
the first column of $T^n$, and thus the general solution of (4.2.1)
is

$$p(A_0, n) = (1-p_0)^n \qquad \text{for } k = 0, \tag{4.2.3}$$

$$p(A_k, n) = p_0 p_1 \cdots p_{k-1} \sum_{i=0}^{k} \frac{(1-p_i)^n}{\prod_{j=0,\, j \ne i}^{k} (p_j - p_i)} , \qquad k > 0 .$$

Using the results of the Bush and Mosteller linear model, the
parameters $p_0$ and $\alpha_1$ can be estimated. The set of $p_k$'s can be
found by (3.4.3). Substituting for the $p_k$'s in (4.2.3), the
following is obtained:

$$p(A_0, n) = (1-p_0)^n ,$$

$$p(A_k, n) = (1-p_0)^{n-k} \prod_{i=0}^{k-1} \frac{\bigl(1-(1-p_0)\alpha_1^{\,i}\bigr)\bigl(1-\alpha_1^{\,n-i}\bigr)}{1-\alpha_1^{\,i+1}} . \tag{4.2.4}$$
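The state probabilities can also be obtained by iterating the difference equation (4.2.1) directly, which gives a check on the product form (4.2.4). The sketch below is an illustration under stated assumptions: the function names and parameter values are hypothetical, and the product form is coded as $(1-p_0)^{n-k}\prod_{i=0}^{k-1}(1-(1-p_0)\alpha_1^i)(1-\alpha_1^{n-i})/(1-\alpha_1^{i+1})$, which requires $0 \le \alpha_1 < 1$.

```python
def recall_prob(k, p0, alpha):
    """p_k from (3.4.3): recall probability after k previous recalls."""
    return alpha**k * p0 + (1.0 - alpha**k)


def state_probs_by_recursion(n, p0, alpha):
    """Iterate (4.2.1) from d_0 = (1, 0, 0, ...).

    Returns the list [p(A_0, n), p(A_1, n), ..., p(A_n, n)].
    """
    probs = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(probs) + 1)
        for k, pr in enumerate(probs):
            pk = recall_prob(k, p0, alpha)
            nxt[k] += pr * (1.0 - pk)   # not recalled: stays in A_k
            nxt[k + 1] += pr * pk       # recalled: moves to A_{k+1}
        probs = nxt
    return probs


def state_prob_product_form(k, n, p0, alpha):
    """Product form of p(A_k, n) for 0 <= k <= n and 0 <= alpha < 1."""
    q0 = 1.0 - p0
    value = q0 ** (n - k)
    for i in range(k):
        value *= (1.0 - q0 * alpha**i) * (1.0 - alpha ** (n - i)) / (1.0 - alpha ** (i + 1))
    return value


# Hypothetical parameter values.
p0, alpha, n = 0.3, 0.8, 6
for k, pr in enumerate(state_probs_by_recursion(n, p0, alpha)):
    print(k, pr, state_prob_product_form(k, n, p0, alpha))
```

The recursion needs no restriction on distinct $p_k$ values, whereas the partial-fraction solution (4.2.3) divides by differences $p_j - p_i$.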

4.3  Expected  Number  of  Times  a  Word  is  Recalled. 

Let $E(k,n)$ be the expected number of times a word is
recalled up to and including trial $n$. By definition $E(k,n)$ is

$$E(k,n) = \sum_{k=0}^{n} k\, p(A_k, n) . \tag{4.3.1}$$

Let $r_{n+1}$ be the expected number of words recalled on trial $n+1$.
By definition $r_{n+1}$ is

$$r_{n+1} = E(k,n+1) - E(k,n) , \tag{4.3.2}$$

or

$$r_{n+1} = \sum_{k=0}^{n+1} k\, p(A_k, n+1) - \sum_{k=0}^{n} k\, p(A_k, n) .$$

Using (4.2.2) the first summation is rewritten so that

$$r_{n+1} = \sum_{k=0}^{n} k\, p(A_k, n)(1-p_k) + \sum_{k=1}^{n+1} k\, p(A_{k-1}, n)\, p_{k-1} - \sum_{k=0}^{n} k\, p(A_k, n) ,$$

or

$$r_{n+1} = \sum_{k=0}^{n} k\, p(A_k, n) - \sum_{k=0}^{n} k\, p(A_k, n)\, p_k + \sum_{k=0}^{n} (k+1)\, p(A_k, n)\, p_k - \sum_{k=0}^{n} k\, p(A_k, n) ,$$

so

$$r_{n+1} = \sum_{k=0}^{n} (k+1)\, p(A_k, n)\, p_k - \sum_{k=0}^{n} k\, p(A_k, n)\, p_k ,$$

or

$$r_{n+1} = \sum_{k=0}^{n} p_k\, p(A_k, n) .$$
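The final identity of Section 4.3 can be verified numerically: the increase in the expected number of recalls from trial $n$ to trial $n+1$ should equal $\sum_k p_k\, p(A_k, n)$. The following is a self-contained sketch with hypothetical function names and parameter values.

```python
def recall_prob(k, p0, alpha):
    """p_k from (3.4.3)."""
    return alpha**k * p0 + (1.0 - alpha**k)


def step(probs, p0, alpha):
    """Advance the state probabilities by one trial using (4.2.1)."""
    nxt = [0.0] * (len(probs) + 1)
    for k, pr in enumerate(probs):
        pk = recall_prob(k, p0, alpha)
        nxt[k] += pr * (1.0 - pk)
        nxt[k + 1] += pr * pk
    return nxt


def expected_recalls(probs):
    """E(k, n) from (4.3.1): sum over k of k * p(A_k, n)."""
    return sum(k * pr for k, pr in enumerate(probs))


def recall_rate_next_trial(probs, p0, alpha):
    """r_{n+1} as the sum over k of p_k * p(A_k, n)."""
    return sum(recall_prob(k, p0, alpha) * pr for k, pr in enumerate(probs))


# Hypothetical parameter values; start from d_0 = (1, 0, 0, ...).
p0, alpha = 0.3, 0.8
probs = [1.0]
for trial in range(1, 6):
    rate = recall_rate_next_trial(probs, p0, alpha)
    nxt = step(probs, p0, alpha)
    print(trial, rate, expected_recalls(nxt) - expected_recalls(probs))
    probs = nxt
```

The two printed columns agree on every trial, and their values give the predicted learning curve that can be compared with the observed proportion of words recalled.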


The Bush and Mosteller model is used more to summarize the
data, and not for prediction. Miller and McGill's model can be
used for prediction. By comparing the predictions made with the
data it can be seen how well the model works.


5.  The  Waugh  and  Smith  Stochastic  Model 

Waugh  and  Smith's  (1962)  model  uses  a  Markovian  process  with 
an  absorbing  state.   For  a  general  discussion  of  Markovian  models 
in  psychology  see  Miller  (1952),  Kao  (1953),  and  Goodman  (1953). 
5.1  Definitions  and  Terms. 

Waugh and Smith (1962) defined three processes that were
named labeling, selecting, and fixing. The process of labeling
was equivalent to a word acquiring a mnemonic tag. For a word
to be recalled it must be labeled, but if a word is labeled it
doesn't mean the word will be recalled. Labeling occurs with
probability $\lambda$ on any trial, and is irreversible. In other words,
once a word is labeled it stays labeled. The second process of
selecting is equivalent to rehearsing a word. Selecting a word
is assumed to occur with probability $\sigma$ on each trial. For a
word to be recalled for the first time on a given trial the word
must have been labeled on that trial or on some previous trial,
and it must be selected on that trial. The third process, fixing
a word, is assumed to occur with probability $\phi$ on any trial in
which a word is recalled. Once a word is fixed it is recalled
on every subsequent trial. If it is not fixed the word is
forgotten, and the word must be selected again with probability $\sigma$.
5.2  States  of  the  Waugh  and  Smith  Model. 

A  word  may  be  in  any  one  of  five  states  after  a  given  trial. 
In  state  one  the  word  hasn't  been  labeled  yet.   A  word  in  state 
two  has  been  labeled,  but  not  selected  yet.   For  state  three  the 
word  has  been  labeled  and  selected,  but  not  as  yet  fixed.   In 
state  three  the  word  was  recalled,  because  it  had  been  labeled 
and  selected.   In  state  four  the  word  has  been  recalled  but  not 
fixed  on  some  previous  trial,  and  it  was  not  selected  on  the  given 
trial.   A  word  in  state  five  has  been  fixed.   State  five  is  an 
absorbing state. The trials are continued until perfect
retention is obtained.

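Let $P_{n,j}$ be the probability of a word being in state $j$ on
trial $n$. By considering how a word can get to one state from
other states the following equations may be written.

$$\begin{aligned}
P_{n,1} &= (1-\lambda) P_{n-1,1} \\
P_{n,2} &= (1-\sigma) P_{n-1,2} + \lambda(1-\sigma) P_{n-1,1} \\
P_{n,3} &= \sigma(1-\phi)\bigl(P_{n-1,2} + P_{n-1,3} + P_{n-1,4}\bigr) + \sigma\lambda(1-\phi) P_{n-1,1} \\
P_{n,4} &= (1-\sigma)\bigl(P_{n-1,3} + P_{n-1,4}\bigr) \\
P_{n,5} &= P_{n-1,5} + \sigma\phi\bigl(P_{n-1,2} + P_{n-1,3} + P_{n-1,4}\bigr) + \lambda\sigma\phi P_{n-1,1}
\end{aligned}$$

The system of equations may be written in matrix notation.

$$\begin{pmatrix}
1-\lambda & 0 & 0 & 0 & 0 \\
\lambda(1-\sigma) & 1-\sigma & 0 & 0 & 0 \\
\sigma\lambda(1-\phi) & \sigma(1-\phi) & \sigma(1-\phi) & \sigma(1-\phi) & 0 \\
0 & 0 & 1-\sigma & 1-\sigma & 0 \\
\sigma\phi\lambda & \sigma\phi & \sigma\phi & \sigma\phi & 1
\end{pmatrix}
\begin{pmatrix} P_{n-1,1} \\ P_{n-1,2} \\ P_{n-1,3} \\ P_{n-1,4} \\ P_{n-1,5} \end{pmatrix}
=
\begin{pmatrix} P_{n,1} \\ P_{n,2} \\ P_{n,3} \\ P_{n,4} \\ P_{n,5} \end{pmatrix}$$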

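Let, as before in Section 4.2, $T$ be the transpose of the matrix
of transition probabilities, and let $d_n$ denote the column vector of
state probabilities on trial $n$. Using the same method as in Section 4.2,

$$T^n d_0 = d_n .$$

The initial vector $d_0$ is the vector $(1,0,0,0,0)'$, because all of
the words start in state one. Therefore, only the first column
of the matrix $T^n$ needs to be found to find the elements of the
vector $d_n$. If $T$ is multiplied by itself a few times a pattern
soon develops. The elements of the first column of $T^n$ can be
written by comparing terms. Thus, the elements of $d_n$ turn out to
be

$$\begin{aligned}
P_{n,1} &= (1-\lambda)^n \\
P_{n,2} &= \frac{\lambda(1-\sigma)}{\lambda-\sigma} \bigl( (1-\sigma)^n - (1-\lambda)^n \bigr) \\
P_{n,3} &= \frac{\sigma\lambda(1-\phi)}{\lambda-\sigma\phi} \bigl( (1-\sigma\phi)^n - (1-\lambda)^n \bigr) \\
P_{n,4} &= \lambda(1-\sigma) \left[ \frac{(1-\sigma\phi)^n - (1-\lambda)^n}{\lambda-\sigma\phi} - \frac{(1-\sigma)^n - (1-\lambda)^n}{\lambda-\sigma} \right] \\
P_{n,5} &= 1 - (1-\sigma\phi)^n - \frac{\sigma\phi(1-\lambda)}{\lambda-\sigma\phi} \bigl( (1-\sigma\phi)^n - (1-\lambda)^n \bigr)
\end{aligned} \tag{5.2.1}$$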
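The five-state chain can be iterated directly from the transition equations of Section 5.2, which provides a check on closed-form expressions for the state probabilities. In the sketch below the function names and parameter values are hypothetical; the closed forms are a reconstruction derived from the recursion (they assume $\lambda \ne \sigma$ and $\lambda \ne \sigma\phi$) rather than a transcription of the printed formulas.

```python
def transition_matrix(lam, sigma, phi):
    """Column-stochastic matrix T: column = state on trial n-1, row = state on trial n."""
    return [
        [1 - lam, 0, 0, 0, 0],
        [lam * (1 - sigma), 1 - sigma, 0, 0, 0],
        [sigma * lam * (1 - phi), sigma * (1 - phi), sigma * (1 - phi), sigma * (1 - phi), 0],
        [0, 0, 1 - sigma, 1 - sigma, 0],
        [sigma * phi * lam, sigma * phi, sigma * phi, sigma * phi, 1],
    ]


def state_probs(n, lam, sigma, phi):
    """d_n = T^n d_0 with d_0 = (1, 0, 0, 0, 0)'."""
    t = transition_matrix(lam, sigma, phi)
    d = [1.0, 0.0, 0.0, 0.0, 0.0]
    for _ in range(n):
        d = [sum(t[r][c] * d[c] for c in range(5)) for r in range(5)]
    return d


def closed_form(n, lam, sigma, phi):
    """Reconstructed closed forms for P_{n,1}, ..., P_{n,5}."""
    a = (1 - sigma * phi) ** n
    b = (1 - lam) ** n
    c = (1 - sigma) ** n
    p1 = b
    p2 = lam * (1 - sigma) / (lam - sigma) * (c - b)
    p3 = sigma * lam * (1 - phi) / (lam - sigma * phi) * (a - b)
    p4 = lam * (1 - sigma) * ((a - b) / (lam - sigma * phi) - (c - b) / (lam - sigma))
    p5 = 1 - a - sigma * phi * (1 - lam) / (lam - sigma * phi) * (a - b)
    return [p1, p2, p3, p4, p5]


# Hypothetical parameter values.
lam, sigma, phi = 0.6, 0.5, 0.4
for n in (1, 2, 5):
    print(n, state_probs(n, lam, sigma, phi), closed_form(n, lam, sigma, phi))
```

Iterating the matrix avoids the divisions by $\lambda - \sigma$ and $\lambda - \sigma\phi$, so it remains usable when those differences are zero or nearly zero.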
Once a word has been recalled for the first time it can
never return to state one or state two. The probability $R_{j,k}$ of
a word being in state $k$ after $j$ trials from the first recall
trial is given by the matrix equation

$$\begin{pmatrix}
\sigma(1-\phi) & \sigma(1-\phi) & 0 \\
1-\sigma & 1-\sigma & 0 \\
\sigma\phi & \sigma\phi & 1
\end{pmatrix}
\begin{pmatrix} R_{j-1,3} \\ R_{j-1,4} \\ R_{j-1,5} \end{pmatrix}
=
\begin{pmatrix} R_{j,3} \\ R_{j,4} \\ R_{j,5} \end{pmatrix} . \tag{5.2.2}$$

Let $Q$ be the first matrix in (5.2.2), the transpose of the
matrix of transition probabilities. Let $S_{j-1}$ and $S_j$ be the
column vectors of the state probabilities on trials $j-1$ and $j$
respectively, also in (5.2.2). Then using the same procedure as
in Section 4.2,

$$Q^j S_0 = S_j .$$

The vector $S_0$ is equal to $(1-\phi,\ 0,\ \phi)'$, because a proportion $\phi$ of the
words are fixed on the trial on which they are first recalled,
while a proportion $1-\phi$ are selected but not fixed. Those
selected but not fixed go into state three. If $Q$ is multiplied by
itself a few times a pattern soon develops. Using this pattern
the elements of $Q^j$ can be easily found:

$$\begin{pmatrix}
\sigma(1-\phi)(1-\sigma\phi)^{j-1} & \sigma(1-\phi)(1-\sigma\phi)^{j-1} & 0 \\
(1-\sigma)(1-\sigma\phi)^{j-1} & (1-\sigma)(1-\sigma\phi)^{j-1} & 0 \\
1-(1-\sigma\phi)^{j} & 1-(1-\sigma\phi)^{j} & 1
\end{pmatrix}
\begin{pmatrix} 1-\phi \\ 0 \\ \phi \end{pmatrix}
=
\begin{pmatrix} R_{j,3} \\ R_{j,4} \\ R_{j,5} \end{pmatrix} .$$

Therefore, the solution for $R_{j,k}$ is

$$\begin{aligned}
R_{j,3} &= \sigma(1-\phi)^2(1-\sigma\phi)^{j-1} \\
R_{j,4} &= (1-\sigma)(1-\phi)(1-\sigma\phi)^{j-1} \\
R_{j,5} &= 1 - (1-\phi)(1-\sigma\phi)^{j}
\end{aligned} \tag{5.2.3}$$

The probability $R_j$ that a word will be recalled after $j$
trials from the first recall is

$$R_j = R_{j,3} + R_{j,5} = 1 - R_{j,4} .$$

Using (5.2.3), then, $R_j$ is

$$R_j = 1 - (1-\sigma)(1-\phi)(1-\sigma\phi)^{j-1} . \tag{5.2.4}$$
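The post-first-recall probability (5.2.4) can be checked by iterating the three-state matrix of (5.2.2) from the starting vector $(1-\phi, 0, \phi)'$. This is a minimal sketch with hypothetical function names and parameter values; a word counts as recalled when it is in state three or state five.

```python
def post_recall_states(j, sigma, phi):
    """State probabilities (R_{j,3}, R_{j,4}, R_{j,5}) j trials after first recall."""
    q = [
        [sigma * (1 - phi), sigma * (1 - phi), 0],
        [1 - sigma, 1 - sigma, 0],
        [sigma * phi, sigma * phi, 1],
    ]
    s = [1 - phi, 0.0, phi]  # starting vector S_0
    for _ in range(j):
        s = [sum(q[r][c] * s[c] for c in range(3)) for r in range(3)]
    return s


def recall_prob_after(j, sigma, phi):
    """Closed form (5.2.4), valid for j >= 1."""
    return 1 - (1 - sigma) * (1 - phi) * (1 - sigma * phi) ** (j - 1)


# Hypothetical parameter values.
sigma, phi = 0.5, 0.4
for j in range(1, 6):
    r3, r4, r5 = post_recall_states(j, sigma, phi)
    print(j, r3 + r5, recall_prob_after(j, sigma, phi))
```

The printed values rise toward one as $j$ grows, reflecting the absorbing fixed state.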

5.3  Estimation  of  the  Parameters. 

Let the probability of first recall on trial $n$ be $F_n$. This
probability is found by considering Pr(1st recall by nth trial)
= Pr(1st recall on nth trial or 1st recall by the (n-1)th trial).
This statement may be rewritten as Pr(1st recall by nth trial) =
Pr(1st recall on trial n) + Pr(1st recall by the (n-1)th trial),
or Pr(1st recall on trial n) = Pr(1st recall by nth trial) -
Pr(1st recall by the (n-1)th trial) = Pr(not yet recalled by the
(n-1)th trial) - Pr(not yet recalled by nth trial). Thus,
$F_n = P_{n-1,2} + P_{n-1,1} - P_{n,2} - P_{n,1}$, or using (5.2.1) $F_n$ is equal
to

$$F_n = \frac{\sigma\lambda}{\sigma-\lambda} \bigl( (1-\lambda)^n - (1-\sigma)^n \bigr) . \tag{5.3.1}$$

The $F_n$ can be estimated from the data for various values of $n$.
Let $x_{i,n} = 0$ if the $i$th word is not recalled on trial $n$, or if it
has been recalled before the given trial. Let $x_{i,n} = 1$ if the
first recall of the word occurs on trial $n$. Then $F_n$ would be
estimated by

$$\hat{F}_n = \frac{1}{N} \sum_{i=1}^{N} x_{i,n} .$$

The parameters $\sigma$ and $\lambda$ are estimated by finding the minimum
chi-square estimates of $\sigma$ and $\lambda$ that best fit the data of (5.3.1).

Let $R_n$ be the probability of recall on trial $n$. When a word
is recalled on a trial it must be in either state three or state
five. Therefore, $R_n$ is given by

$$R_n = P_{n,5} + P_{n,3} .$$

If (5.2.1) is used, then $R_n$ can be written as

$$R_n = 1 - (1-\sigma\phi)^n - \frac{\sigma(\phi-\lambda)}{\lambda-\sigma\phi} \bigl( (1-\sigma\phi)^n - (1-\lambda)^n \bigr) . \tag{5.3.2}$$

The quantity $R_n$ can be estimated from the data by

$$\hat{R}_n = \frac{1}{N} \sum_{i=1}^{N} y_{i,n}$$

where $y_{i,n} = 1$ if the $i$th word is recalled on trial $n$, and $y_{i,n} = 0$
if it is not recalled. Using the estimated values of $\sigma$, $\lambda$, and $R_n$,
the least-squares estimate of $\phi$ is found. This is the estimate
of $\phi$ that is used in the model.
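The two data summaries used in Section 5.3, the proportion of words first recalled on each trial and the proportion recalled on each trial, can be computed from a words-by-trials table of 0/1 indicators. The sketch below uses hypothetical function names and a small made-up data set; it covers only the summaries, not the minimum chi-square or least-squares fitting steps.

```python
def first_recall_proportions(recall):
    """Estimate F_n: share of words whose first recall occurs on trial n.

    recall[i][n - 1] is 1 if word i was recalled on trial n, else 0.
    """
    n_words = len(recall)
    n_trials = len(recall[0])
    counts = [0] * n_trials
    for row in recall:
        if 1 in row:
            counts[row.index(1)] += 1  # index of the first recall
    return [c / n_words for c in counts]


def recall_proportions(recall):
    """Estimate R_n: share of words recalled on trial n."""
    n_words = len(recall)
    return [sum(column) / n_words for column in zip(*recall)]


# Hypothetical data: 4 words observed over 4 trials.
data = [
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
print(first_recall_proportions(data))
print(recall_proportions(data))
```

These observed proportions are the quantities against which (5.3.1) and (5.3.2) would be fitted.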

6.  The  Cowan  Stochastic  Model 

The Cowan model is unlike the previous models discussed,
because this model considers the effect of associative connections
between words in recalling them. Bousfield (1953) showed that
there is a tendency for some words to cluster together when the
list is recalled. When a certain type of word is recalled the
remaining words that have a high association value to the word
recalled are likely to have a higher probability of recall than
words of lower association strength. The entire list is given a
number of times, each time in a different order. After a number
of presentations the subject is asked to recall as many words as
he can. The Cowan model predicts the kinds of words that will
appear in a given recall position.

6.1  Definitions  and  Terms. 

Cowan (1966) considers a list of stimulus words that could
be divided into two groups. One group is denoted as Category $C_1$
and the other as Category $C_2$. An example would be if $C_1$ consisted
of tree names, and $C_2$ consisted of words that were selected
randomly. The strength of $C_1$ or $C_2$ is defined in terms of the
associative connections which exist between its members.

There are four sets of associative interconnections. There
are two within-category associations, $(C_1 \to C_1)$ and $(C_2 \to C_2)$. There
are also the between-category associations, $(C_1 \to C_2)$ and $(C_2 \to C_1)$.
If the first word recalled is a $C_2$ word, then the probability of
recalling a $C_1$ word next would be

$$P(C_1 \mid C_2) = \frac{M(C_2 \to C_1)}{M(C_2 \to C_1) + M(C_2 \to C_2)}$$

where $M(\cdot)$ is a measure of association of $(\cdot)$. The probabilities
$P(C_2 \mid C_1)$, $P(C_1 \mid C_1)$ and $P(C_2 \mid C_2)$ would be defined similarly.
6.2  Estimation  of  Association  Strengths. 

A method suggested by Pollio (1963) may be used to measure the
within-category and the between-category association strengths.
The method is to set up $C_1 \times C_1$, $C_2 \times C_2$, and
$C_1 \times C_2$ matrices. In each cell the association strength
between the corresponding words is entered. Let $c_1$ and $c_2$
be the total number of words in $C_1$ and $C_2$ respectively. Let
$C_1(i)$ be the $i$th word in $C_1$. The association strengths
for selected word lists can be found in Palermo and Jenkins (1964).


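$$
\begin{array}{c|cccc}
         & C_1(1) & C_1(2) & \cdots & C_1(c_1) \\ \hline
C_1(1)   &        &        &        &          \\
C_1(2)   &        &        &        &          \\
\vdots   &        &        &        &          \\
C_1(c_1) &        &        &        &
\end{array}
\qquad \sum\sum = \alpha_1
$$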


The sums of the entries of the $C_1 \times C_1$, $C_2 \times C_2$,
and $C_1 \times C_2$ matrices are symbolized by $\alpha_1$,
$\alpha_2$, and $\beta$ respectively. The mean association value
between any $C_1$ word occurring first in recall and the remaining
$C_1$ words is given by $\alpha_1/c_1$, where $c_1$ represents the
total number of words in the $C_1$ category. Similarly the mean
associative value estimate of a $C_2$ word leading to another $C_2$
word would be $\alpha_2/c_2$. For a $C_1$ word leading to a $C_2$
word the estimate would be $\beta/c_1$, and for a $C_2$ word leading
to a $C_1$ word it would be $\beta/c_2$. In this model the words
within a category are assumed to be indistinguishable from each
other. Thus, the association between any pair in a category is the
same as that between any other pair, and the mean association
strength is used to estimate this value.

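The probability that a $C_1$ word recalled first on a given
trial is followed by another $C_1$ word would be

(6.2.1)   $P(C_1 \mid C_1) = \dfrac{M(C_1 \to C_1)}{M(C_1 \to C_1) + M(C_1 \to C_2)} = \dfrac{\alpha_1/c_1}{\alpha_1/c_1 + \beta/c_1} = \dfrac{\alpha_1}{\alpha_1 + \beta}$.

The probabilities $P(C_1 \mid C_2)$, $P(C_2 \mid C_2)$, and
$P(C_2 \mid C_1)$ would be defined similarly.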
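
As an illustration, the estimates of this section can be computed
mechanically. The following Python sketch is not taken from Cowan
or Pollio: the three small matrices hold made-up association
strengths, and the names `matrix_sum` and `initial_probabilities`
are chosen here only for the example. It sums each matrix to obtain
the three strengths and then evaluates (6.2.1) and its three
analogues.

```python
from fractions import Fraction


def matrix_sum(matrix):
    """Sum every cell of an association matrix given as a list of rows."""
    return sum(sum(row) for row in matrix)


def initial_probabilities(alpha1, alpha2, beta):
    """Evaluate (6.2.1) and its analogues.

    Keys are (next category, category of the first word), i.e. P(next | first).
    The category sizes cancel, so only the three sums are needed.
    """
    a1, a2, b = Fraction(alpha1), Fraction(alpha2), Fraction(beta)
    return {
        ("C1", "C1"): a1 / (a1 + b),
        ("C2", "C1"): b / (a1 + b),
        ("C2", "C2"): a2 / (a2 + b),
        ("C1", "C2"): b / (a2 + b),
    }


# Hypothetical association strengths for two three-word categories.
# Diagonals are zero because a word is not paired with itself.
c1_by_c1 = [[0, 4, 2], [4, 0, 6], [2, 6, 0]]
c2_by_c2 = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
c1_by_c2 = [[1, 0, 2], [0, 1, 1], [2, 1, 0]]

alpha1 = matrix_sum(c1_by_c1)   # 24
alpha2 = matrix_sum(c2_by_c2)   # 12
beta = matrix_sum(c1_by_c2)     # 8

probs = initial_probabilities(alpha1, alpha2, beta)
print(probs[("C1", "C1")])      # alpha1 / (alpha1 + beta) = 3/4
```

Exact fractions are used so that each pair of probabilities
conditioned on the same first word can be checked to sum to one.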
6.3  The  Non-Markovian  Process  of  the  Cowan  Model. 

The  probabilities  of  (6.2.1)  change  on  the  next  recall  of 
the  same  trial,  because  once  a  word  has  been  recalled  it  will 
not  be  a  possible  response  for  the  next  word  recalled.   Thus,  the 
within-category  mean  association  value  and  the  between-category 
mean  association  value  are  reduced.   Since  it  was  assumed  that 
all  words  are  indistinguishable  in  a  category,  the  association 
strength  between  each  pair  is  equal  in  value  to  the  association 
strength  between  every  other  pair. 

The mean association strength between each pair in $C_1$ is
given by

(6.3.1)   $\dfrac{\alpha_1}{c_1(c_1 - 1)}$.

So, the new within-category mean association value encountered by
the second $C_1$ word would be

$\dfrac{\alpha_1}{c_1} - \dfrac{\alpha_1}{c_1(c_1 - 1)}$   or   $\dfrac{\alpha_1}{c_1^2 - c_1}(c_1 - 2)$.

Each time another word from $C_1$ is recalled the mean within-
association value is reduced by the amount given by (6.3.1).
When a total of $r_1$ words from $C_1$ have been recalled the mean
association value encountered by the next $C_1$ word is given by

(6.3.2)   $M(C_1 \to C_1) = \dfrac{\alpha_1}{c_1^2 - c_1}(c_1 - r_1 - 1)$.

Similarly, when a total of $r_2$ words from $C_2$ have been recalled
the mean within-association value left for the next $C_2$ word is
given by

(6.3.3)   $M(C_2 \to C_2) = \dfrac{\alpha_2}{c_2^2 - c_2}(c_2 - r_2 - 1)$.

The mean between-association strength for a $C_1$ word and a
$C_2$ word would be given by

(6.3.4)   $\dfrac{\beta}{c_1 c_2}$.

Each time a $C_2$ word is recalled the mean association strength is
reduced by the amount given in (6.3.4). So, after $r_2$ words are
recalled from $C_2$ the remaining mean association strength left for
the next $C_2$ word is given by

(6.3.5)   $M(C_1 \to C_2) = \dfrac{\beta}{c_1 c_2}(c_2 - r_2)$.

Similarly, for $r_1$ words recalled from $C_1$ the mean association
strength of a $C_2$ word leading to a $C_1$ word would be equal to

(6.3.6)   $M(C_2 \to C_1) = \dfrac{\beta}{c_1 c_2}(c_1 - r_1)$.


Thus, using the above equations for $M(\cdot)$ and the equations
for $P(\cdot)$,

(6.3.7)

a.   $P(C_1 \mid C_1) = \dfrac{\alpha_1 c_2 (c_1 - r_1 - 1)}{\alpha_1 c_2 (c_1 - r_1 - 1) + \beta (c_1 - 1)(c_2 - r_2)}$

b.   $P(C_2 \mid C_1) = \dfrac{\beta (c_1 - 1)(c_2 - r_2)}{\alpha_1 c_2 (c_1 - r_1 - 1) + \beta (c_1 - 1)(c_2 - r_2)}$

c.   $P(C_2 \mid C_2) = \dfrac{\alpha_2 c_1 (c_2 - r_2 - 1)}{\alpha_2 c_1 (c_2 - r_2 - 1) + \beta (c_2 - 1)(c_1 - r_1)}$

d.   $P(C_1 \mid C_2) = \dfrac{\beta (c_2 - 1)(c_1 - r_1)}{\alpha_2 c_1 (c_2 - r_2 - 1) + \beta (c_2 - 1)(c_1 - r_1)}$

The process can be in two states, $C_1$ or $C_2$. The transition
probabilities are functions of the number of each type of word
recalled, and so this is a non-Markovian process.
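
The dependence of (6.3.7) on the recall counts can be seen by
evaluating it directly. The Python sketch below is only an
illustration: the function name `next_word_probabilities` and the
numerical values are hypothetical, and $r_1$ and $r_2$ are read as the
numbers of $C_1$ and $C_2$ words recalled before the current word, an
assumption chosen so that $r_1 = r_2 = 0$ reproduces (6.2.1).

```python
from fractions import Fraction


def next_word_probabilities(prev, r1, r2, c1, c2, alpha1, alpha2, beta):
    """Evaluate (6.3.7) for the word that follows a word of category `prev`.

    prev is 1 or 2. r1 and r2 are taken to be the numbers of C1 and C2 words
    recalled before the current word (an assumed reading of the text).
    Returns (P(C1 next), P(C2 next)), or None when no words remain.
    """
    if c1 < 2 or c2 < 2:
        raise ValueError("each category needs at least two words")
    if prev == 1:
        within = Fraction(alpha1) * c2 * (c1 - r1 - 1)   # numerator of (6.3.7)a
        between = Fraction(beta) * (c1 - 1) * (c2 - r2)  # numerator of (6.3.7)b
        to_c1, to_c2 = within, between
    elif prev == 2:
        within = Fraction(alpha2) * c1 * (c2 - r2 - 1)   # numerator of (6.3.7)c
        between = Fraction(beta) * (c2 - 1) * (c1 - r1)  # numerator of (6.3.7)d
        to_c1, to_c2 = between, within
    else:
        raise ValueError("prev must be 1 or 2")
    total = to_c1 + to_c2
    if total == 0:
        return None
    return to_c1 / total, to_c2 / total


# Hypothetical values: three words per category,
# alpha1 = 24, alpha2 = 12, beta = 8.
first = next_word_probabilities(1, 0, 0, 3, 3, 24, 12, 8)
second = next_word_probabilities(1, 1, 0, 3, 3, 24, 12, 8)
print(first)    # probabilities after the first C1 word
print(second)   # probabilities after a second C1 word
```

With these made-up values the probability of staying in $C_1$ falls
from 3/4 to 3/5 once one $C_1$ word has been used up, which is the
history dependence that makes the two-state process non-Markovian.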
6.4  The  Transition  Matrix. 

By redefining the states to represent the type of word and
the number of words of each type recalled, following Feller (1957),
the process can be treated as a Markov chain. Let $C_i(m,n)$ be the
state in which a $C_i$ word has just been given with $m$ $C_1$ words
and $n$ $C_2$ words previously recalled.

The probabilities for the transition matrix are found from
(6.3.7). For example, consider the probability of going from
state $C_1(i,n)$ to state $C_1(i+1,n)$. Using (6.3.7)a with
$r_1 = i$ and $r_2 = n$, the probability
$P(C_1(i+1,n) \mid C_1(i,n))$ is calculated and entered in the
transition matrix.

The transition matrix $P$ can be arranged into sets, such that
when one set is reached the process cannot enter the states
located below it, and after a set has been entered it is
immediately left. All sets then are transient except the set
representing complete recall, whose states are absorbing. An
example of how matrix $P$ would be arranged is given in Table 1.
Each set contains all the possible states involved in recalls of
the length denoted by the set number. The states are numbered to
conserve space and divided into sets labeled I, II, III, etc.
Matrices of the form in Table 1 are submatrices of the matrix $P$.
Let $Q$ be any submatrix formed this way. The sets in matrix $Q$
are transient. Kemeny and Snell (1960) have proved that the matrix
$H$, which gives the probabilities that a process will ever go from
any transient state to any other transient state, is given by

$H = (N - I)\,N_{dg}^{-1}$,

where $N = (I - Q)^{-1}$, and $N_{dg}$ is a diagonal matrix whose
elements are the same as the diagonal elements of $N$. The matrix
$Q$ has nonzero elements only below the diagonal, so the matrix $N$
has ones on the diagonal. Thus $N_{dg}^{-1}$ is the identity
matrix, so

(6.4.1)   $H = (I - Q)^{-1} - I$.


For example, the probability of starting in state $C_1(0,0)$
and ending in state $C_2(3,2)$ in the sixth recall position can be
found in the matrix $H$. The matrix $Q$ would include the sets I
through VI.
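
The calculation in (6.4.1) can be sketched in a few lines of Python.
The sketch is an illustration under assumed values, not Cowan's own
procedure: the function names, the three-word categories, and the
strengths 24, 12, and 8 are hypothetical. Because every transition
advances the recall position by one, the powers of $Q$ eventually
vanish, so the sketch evaluates $(I - Q)^{-1} - I$ as the finite sum
$Q + Q^2 + \cdots$ rather than by explicit matrix inversion.

```python
from fractions import Fraction


def step(kind, m, n, c1, c2, a1, a2, b):
    """(6.3.7) from state C_kind(m, n): (P(next is C1), P(next is C2))."""
    if kind == 1:
        to_c1, to_c2 = a1 * c2 * (c1 - m - 1), b * (c1 - 1) * (c2 - n)
    else:
        to_c1, to_c2 = b * (c2 - 1) * (c1 - m), a2 * c1 * (c2 - n - 1)
    total = to_c1 + to_c2
    if total == 0:          # every word has been recalled
        return None
    return Fraction(to_c1) / total, Fraction(to_c2) / total


def build_q(c1, c2, a1, a2, b, last_position):
    """States C_kind(m, n) for positions 1..last_position, and the matrix Q."""
    states = [(kind, m, pos - 1 - m)
              for pos in range(1, last_position + 1)
              for m in range(pos)
              for kind in (1, 2)
              if m + (kind == 1) <= c1 and (pos - 1 - m) + (kind == 2) <= c2]
    index = {state: i for i, state in enumerate(states)}
    q = [[Fraction(0)] * len(states) for _ in states]
    for (kind, m, n), row in index.items():
        probs = step(kind, m, n, c1, c2, a1, a2, b)
        if probs is None:
            continue
        m2, n2 = (m + 1, n) if kind == 1 else (m, n + 1)
        for next_kind, p in zip((1, 2), probs):
            target = (next_kind, m2, n2)
            if target in index:   # targets past last_position are left out of Q
                q[row][index[target]] = p
    return states, index, q


def ever_reach(q):
    """H = (I - Q)^{-1} - I, computed as the finite sum Q + Q^2 + ... ."""
    size = len(q)
    h = [[Fraction(0)] * size for _ in range(size)]
    power = q
    while any(any(row) for row in power):
        h = [[x + y for x, y in zip(hr, pr)] for hr, pr in zip(h, power)]
        power = [[sum(power[i][k] * q[k][j] for k in range(size))
                  for j in range(size)] for i in range(size)]
    return h


# Hypothetical list: three words per category, strengths 24, 12, and 8.
states, index, q = build_q(3, 3, 24, 12, 8, last_position=6)
h = ever_reach(q)
start, end = index[(1, 0, 0)], index[(2, 3, 2)]
print(h[start][end])   # probability of C_2(3,2) in position six from C_1(0,0)
```

The states are listed here from the first recall position upward, so
this $Q$ is nonzero only above the diagonal instead of below it; the
ordering does not affect the entries of $H$.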
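TABLE 1

TRANSITION MATRIX P FOR THE FIRST FOUR WORDS RECALLED
(X's signify nonzero entries; dots signify zero entries)

                       IV                III           II        I
Set  No.  State        1 2 3 4 5 6 7 8   1 2 3 4 5 6   1 2 3 4   1 2
IV   1    C_1(3,0)     . . . . . . . .   . . . . . .   . . . .   . .
IV   2    C_2(3,0)     . . . . . . . .   . . . . . .   . . . .   . .
IV   3    C_1(2,1)     . . . . . . . .   . . . . . .   . . . .   . .
IV   4    C_2(2,1)     . . . . . . . .   . . . . . .   . . . .   . .
IV   5    C_1(1,2)     . . . . . . . .   . . . . . .   . . . .   . .
IV   6    C_2(1,2)     . . . . . . . .   . . . . . .   . . . .   . .
IV   7    C_1(0,3)     . . . . . . . .   . . . . . .   . . . .   . .
IV   8    C_2(0,3)     . . . . . . . .   . . . . . .   . . . .   . .
III  1    C_1(2,0)     X X . . . . . .   . . . . . .   . . . .   . .
III  2    C_2(2,0)     . . X X . . . .   . . . . . .   . . . .   . .
III  3    C_1(1,1)     . . X X . . . .   . . . . . .   . . . .   . .
III  4    C_2(1,1)     . . . . X X . .   . . . . . .   . . . .   . .
III  5    C_1(0,2)     . . . . X X . .   . . . . . .   . . . .   . .
III  6    C_2(0,2)     . . . . . . X X   . . . . . .   . . . .   . .
II   1    C_1(1,0)     . . . . . . . .   X X . . . .   . . . .   . .
II   2    C_2(1,0)     . . . . . . . .   . . X X . .   . . . .   . .
II   3    C_1(0,1)     . . . . . . . .   . . X X . .   . . . .   . .
II   4    C_2(0,1)     . . . . . . . .   . . . . X X   . . . .   . .
I    1    C_1(0,0)     . . . . . . . .   . . . . . .   X X . .   . .
I    2    C_2(0,0)     . . . . . . . .   . . . . . .   . . X X   . .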

6.5  Adjustments  Made  for  the  Model.

When the data and the model were compared it was found
necessary to make some adjustments. Cowan found that a better
fit was obtained by taking $c_1$ and $c_2$ to be the mean number
of $C_1$ and $C_2$ words recalled respectively. Cowan thought
that $c_1$ and $c_2$ had to be redefined because the subject
received, organized, and recalled completely only a limited number
of items on the list. Finally, $\alpha_1$ was made a free
parameter and a family of curves was generated. The value of
$\alpha_1$ which gave the best fit to the data was picked. Cofer
and Reicher (1964), and Puff (1964) demonstrated that when words in
a category appear together in the list presented, they will tend to
appear together in recall. Thus, the occurrence of items together
in the list might increase the association between them, and this
would increase the value of $\alpha_1$.

7.  Summary 

Using (3.5.3) and (3.6.3) the values of $p_0$ and $\alpha_1$
can be estimated. Knowing only these two values the data of a
free-recall experiment can be summarized by Bush and Mosteller's
model. Knowing the estimates of $\alpha_1$ and $p_0$ the
probability $p_k$ of recalling a word after $k$ recalls can be
found by using (3.4.3) of the linear model.

Using Miller and McGill's model and the values of $p_0$ and
$\alpha_1$ estimated by the Bush and Mosteller model, an
experimenter can find the probability of a word being recalled
exactly $k$ times on trial $n$ by using (4.2.4). Using Miller and
McGill's model the expected number of times a word is recalled up
to and including trial $n$ can be found by using (4.3.1). Another
quantity of interest is the expected number of words recalled on
trial $n+1$, which can be found by using (4.3.2).

If the processes of labeling, selecting, and fixing a word
are considered, the Waugh and Smith model may be used. With
their model the probability $R_j$ that a word will be recalled
$j$ trials after the first recall can be found by using (5.2.4).
By estimating the probability $F_n$ of first recall on the $n$th
trial the values of $\sigma$ and $\lambda$ can be found by a
minimum chi-square fit to (5.3.1). By estimating the probability
$R_n$ of recalling a word on trial $n$ from the data and using the
estimates of $\sigma$ and $\lambda$, the least-squares estimate of
$\phi$ is found using (5.3.2). If the values of $\sigma$,
$\lambda$, and $\phi$ are already known, say from a previous and
similar experiment, the probability of first recall on trial $n$
and the probability of recall on trial $n$ can be found by (5.3.1)
and (5.3.2) respectively.

Cowan's model is used when an experimenter wishes to consider
the effect of associations between words. The model is limited
to the case where the words in a list can be put into two
categories. Once a measure of association is found between
categories or within categories various probabilities can be
found. Using (6.3.7) the probability of a word from a category
following a word from the same category or the other category can
be found given that $r_1$ words have been recalled from $C_1$ and
$r_2$ words from $C_2$. A transition matrix can be formed by
letting the states represent the type of word and the number of
words of each type recalled. Using (6.4.1) the probability of
starting in a state and ending in a certain state can be found.

By comparing the predictions of the models with the data
obtained, the experimenter can determine which model best fits his
experiment. With the parameter values known the data can be
summarized. Individual subjects can be compared easily, and the
effects of changing the number of words in the list or the speed
of presentation of words can be measured readily in terms of the
parameters.

ABSTRACT

The  purpose  of  this  paper  is  to  show  four  stochastic  models 
used  to  represent  the  data  of  a  free  recall  experiment.   The 
free-recall  experiment  for  the  first  three  models  considered  in 
this  paper  is  one  in  which  a  subject  is  given  as  many  trials  as 
necessary  to  completely  learn  a  list  of  words.   In  the  last  model 
considered  the  subject  may  only  partially  learn  the  list  of  words. 

The  first  model  considered  is  the  Bush  and  Mosteller  linear 
model.  Changes  in  the  probabilities  of  recall  or  non-recall  are 
described  with  the  aid  of  linear  operators.  By  knowing  only  two 
parameters  the  data  of  a  free-recall  experiment  can  be  summarized. 

The next model considered is the Miller and McGill model.
Their model is closely related to Bush and Mosteller's model.
Using the estimates of Bush and Mosteller's model in Miller and
McGill's model the probability of recalling a word exactly k times
in n trials, and the expected number of times a word is recalled
in n trials can be found.

The  third  model  discussed  is  Waugh  and  Smith's  stochastic 
model.   The  model  describes  a  Markov  process  with  a  realizable 
absorbing  state,  allowing  complete  learning  on  some  finite  trial 
as  well  as  imperfect  retention  prior  to  this  trial. 

The last model considered is the Cowan model. This model
considers the effect of associations between words that will
appear in a given recall position. The recall of words is
regarded as a Markov chain where the category of the recalled word
is determined by the kind of word preceding it. Three parameters
are used which are based on associative measures between and
within categories of stimulus words.


